Fix flannel hang if lease expired #1334
The problem is that two goroutines share events via a chan. If the goroutine that reads from the chan terminates before the one that writes to it, the latter has a chance of hanging on its write to the chan.
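The hang described here can be reproduced outside flannel. The following is a minimal sketch, not flannel's code: `producer` and `readerQuitsEarly` are hypothetical names, and the guarded send is a common Go idiom shown only for comparison, not necessarily what this PR does.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// producer is a hypothetical writer goroutine. It sends n values on out and
// closes done when it returns. With guarded == false it uses a plain send,
// which blocks forever once nobody is receiving. With guarded == true the
// send also watches ctx.Done(), so it can give up.
func producer(ctx context.Context, out chan<- int, n int, guarded bool, done chan<- struct{}) {
	defer close(done)
	for i := 0; i < n; i++ {
		if !guarded {
			out <- i
			continue
		}
		select {
		case out <- i:
		case <-ctx.Done():
			return
		}
	}
}

// readerQuitsEarly receives one value, then cancels ctx and stops reading,
// mimicking a reader loop that exits while the writer is still active.
// It reports whether the producer managed to exit within the wait period.
func readerQuitsEarly(guarded bool) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	out := make(chan int) // unbuffered event channel
	done := make(chan struct{})
	go producer(ctx, out, 2, guarded, done)

	<-out    // read the first value
	cancel() // the reader stops here; nothing receives the second value

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // the producer is still blocked on its send (and leaks)
	}
}

func main() {
	fmt.Println("plain send, producer exited:", readerQuitsEarly(false))
	fmt.Println("guarded send, producer exited:", readerQuitsEarly(true))
}
```

With the plain send, the producer stays blocked on its second write because the reader has already gone away. With the guarded send it notices the cancellation and returns.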
n.handleSubnetEvents(evtBatch)
case <-ctx.Done():
Why remove this?
If we keep this, there is a chance that this for loop ends before the subnet.WatchLeases goroutine terminates. Then nothing reads events from the evts chan, while the subnet.WatchLeases goroutine may be in the middle of writing to it. As a result, subnet.WatchLeases hangs forever on receiver <- batch.
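The reasoning in this reply can be sketched as follows: if the reader keeps draining until the writer closes the channel, the writer is never left blocked on a send. This is an illustration under assumptions, not the PR's actual diff. `watch`, `consume`, and `runDrain` are hypothetical names, and `watch` only stands in for a producer such as subnet.WatchLeases.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// watch is a hypothetical producer. It sends on an unbuffered channel,
// checks ctx only between sends, and closes the channel when it returns.
func watch(ctx context.Context, out chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	defer close(out)
	for i := 0; ; i++ {
		if ctx.Err() != nil {
			return
		}
		out <- i // blocks until a reader receives
	}
}

// consume reads until the producer closes the channel. It deliberately does
// not return on ctx.Done(), so a pending send always finds a receiver.
// It cancels the context after cancelAfter values to simulate shutdown.
func consume(in <-chan int, cancelAfter int, cancel context.CancelFunc) int {
	n := 0
	for range in {
		n++
		if n == cancelAfter {
			cancel()
		}
	}
	return n
}

// runDrain wires the two together and returns how many values were received.
// Returning at all shows that the producer goroutine exited.
func runDrain() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	evts := make(chan int)
	var wg sync.WaitGroup
	wg.Add(1)
	go watch(ctx, evts, &wg)
	received := consume(evts, 3, cancel)
	wg.Wait() // would block forever if the producer were stuck on a send
	return received
}

func main() {
	fmt.Println("values received before the producer exited:", runDrain())
}
```

The count may be 3 or 4, depending on whether the producer had already started its next send when the context was cancelled. Either way the producer terminates, because the reader is still there to receive that last value.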
@chenchun Looks good. But I am not sure why you want to remove the handling of the context cancellation event.
@chenchun Would you mind joining one of the Community Meetings to pursue this PR? We are still not sure about ignoring cancelled contexts.
I'd like to, but it seems the next meeting is two weeks away.
/lgtm
Description
Flannel sometimes hangs when a lease expires. I checked the goroutines, and it hangs at https://github.com/coreos/flannel/blob/v0.12.0/subnet/watch.go#L59 .
I can't share the goroutine screenshot because I'm using a modified version of flannel.
Todos
Release Note